National Repository of Grey Literature
Movie Recommender System
Janko, Pavel ; Zbořil, František (referee) ; Šůstek, Martin (advisor)
This thesis addresses methods of constructing a system for movie recommendations and covers both the basic and the advanced techniques required to create a recommender system. The core of the thesis is the design and implementation of, and experimentation with, a movie recommender system based on data from publicly accessible datasets. To predict the ratings that a user would give to movies after watching them, the system uses a factorization model based on collaborative filtering. The thesis also describes the relation between the model's hyperparameter configuration and its prediction accuracy, reports the experiments conducted to further improve the model accuracy, and compares the implemented model with existing solutions.
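The abstract does not say which factorization model the thesis uses, so the following is only a minimal sketch of one common variant: plain matrix factorization trained by stochastic gradient descent on (user, item, rating) triples. The names `train_mf`, `predict` and `mse`, the toy ratings, and the hyperparameter values (`k`, `lr`, `reg`, `epochs`) are all assumptions made for illustration, not details from the thesis.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
             epochs=300, seed=0):
    """Fit latent factors P (users) and Q (items) with SGD.

    ratings: list of (user_index, item_index, rating) triples.
    k, lr, reg, epochs are illustrative hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    # Small positive random initialisation of the latent factors.
    P = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(P, Q, ratings):
    """Mean squared error over the given rating triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Toy data (hypothetical): 3 users, 3 movies, ratings on a 1-5 scale.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = train_mf(toy_ratings, n_users=3, n_items=3)
print(round(mse(P, Q, toy_ratings), 4))
```

Varying `k`, `lr` and `reg` in a sketch like this is one simple way to observe the kind of dependence between hyperparameter configuration and prediction accuracy that the abstract mentions.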
Improving Consistency in Text Recognition Datasets
Tvarožný, Matúš ; Hradiš, Michal (referee) ; Kišš, Martin (advisor)
This work is concerned with increasing the consistency of datasets for text recognition. It describes the problems that cause inconsistency and then presents solutions to eliminate them. The work investigates the effect of the properties of the polygons that define text line boundaries, and how a modified version of the dataset composed of ideal text line variants affects the accuracy of the model. It further focuses on detecting, and then removing or modifying, text lines whose ground truth transcription does not match the text they actually contain. Experiments showed that removing the visual inconsistency from the training set had no significant effect on the trained model, but modifying the test set improved the measured OCR accuracy of the model by 1.1% CER. After the dataset was modified so that it contained no mutually inconsistent pairs of recognized text and corresponding ground truth, the re-trained model improved by at most 0.2% CER. The main finding of this work is the demonstrated benefit of removing inconsistencies from test sets, which makes it possible to determine a more realistic error rate for an OCR model.
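For readers unfamiliar with the metric, CER (character error rate) is commonly computed as the total character-level edit distance between recognized text and ground truth, divided by the total number of ground truth characters. The sketch below illustrates that standard definition; the function names `levenshtein` and `cer` are hypothetical, and the thesis may compute the metric with different normalization or tooling.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(pairs):
    """Character error rate over (recognized, ground_truth) pairs."""
    total_edits = sum(levenshtein(hyp, ref) for hyp, ref in pairs)
    total_chars = sum(len(ref) for _, ref in pairs)
    if total_chars == 0:
        raise ValueError("ground truth contains no characters")
    return total_edits / total_chars


# One missing character in a five-character ground truth line.
print(cer([("helo", "hello")]))
```

Under this definition, a text line whose ground truth transcription is wrong adds edits to the total even when the model reads the line correctly, which is consistent with the abstract's point that cleaning the test set gives a more realistic error rate.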
Analysis of Mobile Devices Network Communication Data
Abraham, Lukáš ; Bartík, Vladimír (referee) ; Burgetová, Ivana (advisor)
The thesis first describes the DNS and SSL/TLS protocols, focusing on communication between devices that use them. It then covers data preprocessing and data cleaning, followed by basic data mining techniques such as classification, association rules, information retrieval, regression analysis and cluster analysis. The next chapter explains how mobile devices can be identified on a network. The thesis then evaluates the datasets used in the practical part, which contain data collected from communication over the protocols mentioned above. It goes on to present the design of a system for analyzing network communication data, together with the libraries used and the full implementation of the system. Finally, a large number of experiments are performed and evaluated.
Data Sets for Network Security
Setinský, Jiří ; Hranický, Radek (referee) ; Tisovčík, Peter (advisor)
In network security, machine learning techniques are used to detect anomalies and malware in network traffic effectively. Training a network classifier with high accuracy requires a high-quality dataset. The aim of this work is to modify a dataset using machine learning techniques so that its quality improves and the model trained on it achieves higher accuracy. The dataset is analyzed by a clustering algorithm, and each cluster is characterized by a statistical description derived from the attributes of the input dataset. This statistical description, together with information from the original classifier, is used to compute a score, which serves as a weight in the modification phase. Cluster analysis makes it possible to select the data that are important for training the final model. The proposed approach can reduce redundancy in the dataset or augment it with missing data. The result is a modification framework that can reduce datasets or aggregate them in order to create a compact dataset that reflects actual network traffic. Models trained on the resulting datasets achieved higher accuracy than the existing solution.
Model of Cycling Traffic Intensity in Brno
Eliáš, Radoslav ; Burget, Radek (referee) ; Hynek, Jiří (advisor)
The Brno data department has access to several datasets on cyclist counts. The aim of this work was to build a model integrating these sources for the transport department of the city municipality, giving it an overview of how the infrastructure is used each day. Each dataset is aggregated onto a different base map with a slightly different street network. This work presents an algorithmic approach to matching streets based on similarity, percentage overlap and other parameters. Two algorithms are provided for geometry matching, one based on points and one based on line segments. The work also provides a model that matches locations across the different datasets and a dashboard that visualizes their values side by side. The robustness of the algorithms allows them to be used in any geographic application that works with spatial data. The dashboard provides useful information about cycling traffic both for ordinary users and for the experts who design the infrastructure of the city of Brno.
